
@ben-schwen (Member) commented Oct 28, 2025

Adds arithmetic support to GForce, as requested in #3815, but does not add support for blocks in `j` such as `d[, j={x<-x; .(min(x))}, by=y]`.
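Conceptually, GForce evaluates each supported aggregate once across all groups and returns one value per group. Once every aggregate in an arithmetic expression has been computed that way, the arithmetic itself is ordinary element-wise vector arithmetic on those per-group results. The base-R sketch below illustrates that idea with `tapply()`. It is not data.table's C implementation, and `grouped_range()` is a hypothetical name used only for this illustration.

```r
# Sketch (not data.table internals): compute each aggregate once for all
# groups, then combine the per-group vectors with vectorised arithmetic.
grouped_range = function(x, g) {
  # order groups by first appearance, as data.table's by= does
  g = factor(g, levels = unique(g))
  gmax = tapply(x, g, max)  # one max per group
  gmin = tapply(x, g, min)  # one min per group
  gmax - gmin               # element-wise: the "arithmetic" part of j
}

x = c(5, 1, 9, 4, 7, 2)
g = c("a", "b", "a", "b", "c", "c")
grouped_range(x, g)  # per-group range: a = 4, b = 3, c = 5
```

Evaluating `max(v) - min(v)` separately inside each group gives the same answer; the point of the vectorised form is that each aggregate is a single pass over the data instead of one R-level call per group.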

codecov bot commented Oct 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.94%. Comparing base (851467f) to head (8129198).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #7401      +/-   ##
==========================================
- Coverage   98.96%   98.94%   -0.02%     
==========================================
  Files          87       87              
  Lines       16734    16819      +85     
==========================================
+ Hits        16560    16642      +82     
- Misses        174      177       +3     
```
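The figures in the coverage diff are internally consistent: lines = hits + misses, and coverage = hits / lines. The short R check below reproduces the reported percentages, assuming Codecov's default of rounding down to two decimal places; `cov_pct()` is a hypothetical helper written for this check.

```r
# Coverage = hits / lines, rounded down to two decimal places
cov_pct = function(hits, lines) floor(10000 * hits / lines) / 100

# lines = hits + misses, for both master and the PR head
stopifnot(16560 + 174 == 16734, 16642 + 177 == 16819)

base_cov = cov_pct(16560, 16734)  # master: 98.96
head_cov = cov_pct(16642, 16819)  # PR head: 98.94
c(base = base_cov, head = head_cov, diff = head_cov - base_cov)
```

With ordinary rounding the head figure would come out as 98.95%, so the reported 98.94% is consistent with truncation rather than rounding to nearest.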


github-actions bot commented Oct 28, 2025

No obvious timing issues in HEAD=modular_gforce
Comparison Plot

Generated via commit 0480ee5

Download link for the artifact containing the test results: ↓ atime-results.zip

| Task | Duration |
|---|---|
| R setup and installing dependencies | 3 minutes and 4 seconds |
| Installing different package versions | 44 seconds |
| Running and plotting the test cases | 5 minutes and 9 seconds |

ben-schwen marked this pull request as ready for review on November 2, 2025 at 18:01
@ben-schwen (Member, Author)

I'm also not sure about moving the tests to optimize.Rraw, since it feels somewhat wrong and is no longer needed now that test() has the new levels/optimization parameter.

ben-schwen mentioned this pull request on Nov 2, 2025

@ben-schwen (Member, Author)

@MichaelChirico I'm also not 100% convinced about the new optimize.Rraw. I guess the whole idea was that we could simply run the script multiple times with different optimization levels. This need was eliminated by adding the optimize parameter to test() which somehow feels cleaner.
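The benefit of an `optimize` argument on `test()` is that one expression can be checked under several `datatable.optimize` levels without re-running a whole script. A minimal base-R sketch of that pattern follows. `run_at_levels()` and the toy `f()` are hypothetical and are not data.table's `test()` API; only the option name `datatable.optimize` comes from the code under review.

```r
# Hypothetical helper: evaluate fun() once per optimization level, restore the
# previous option value afterwards, and report whether all results agree.
run_at_levels = function(fun, levels = c(0L, 1L, 2L, Inf)) {
  old = getOption("datatable.optimize")
  on.exit(options(datatable.optimize = old), add = TRUE)  # NULL removes it again
  results = lapply(levels, function(lvl) {
    options(datatable.optimize = lvl)
    fun()
  })
  # every level must reproduce the result of the first (least optimized) one
  all_same = all(vapply(results[-1L], identical, logical(1L), results[[1L]]))
  list(results = results, all_same = all_same)
}

# Toy function with a "fast" and a "slow" path that must give the same answer
f = function() {
  x = c(3, 1, 2)
  if (getOption("datatable.optimize") >= 1L) sum(x) else Reduce(`+`, x)
}

run_at_levels(f)$all_same
```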

@MichaelChirico (Member)

> @MichaelChirico I'm also not 100% convinced about the new optimize.Rraw. I guess the whole idea was that we could simply run the script multiple times with different optimization levels. This need was eliminated by adding the optimize parameter to test() which somehow feels cleaner.

I see. I still like the idea of a separate script: the more we peel out of the behemoth tests.Rraw, the better. "Eventually" it would be nice to have most tests live in purpose-made test scripts, IMO.

```r
test(2357.2, fread(paste0("file://", f)), DT)
})

# gforce should also work with Map in j #5336
```

Member

One last idea: what happens when the grouping column is part of the aggregation in `j`?

```r
DT[, .(sum(b) - mean(a)), by=b]
```

@ben-schwen (Member, Author)

When the grouping column is part of the aggregation, we turn off GForce, since the grouping column will not be in .SDall and therefore fails the `.gforce_ok()` check:

data.table/R/data.table.R, lines 430 to 432 in 8129198:

```r
for (ii in seq.int(from=2L, length.out=length(jsub)-1L)) {
  if (!.gforce_ok(jsub[[ii]], SDenv$.SDall, envir)) {GForce = FALSE; break}
}
```
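A much-simplified illustration of why `DT[, .(sum(b) - mean(a)), by=b]` falls back: every aggregate's argument must be a column of `.SD`, and `.SD` excludes the grouping columns. The functions `agg_ok()` and `expr_ok()` below are hypothetical stand-ins, not data.table's `.gforce_ok()`; the real check covers more functions and argument forms, while this sketch accepts only an illustrative subset of aggregates and rejects constants and parentheses.

```r
# Accept fun(col) where fun is an (illustrative) GForce-able aggregate and
# col is a column of .SD, i.e. not one of the grouping columns.
agg_ok = function(e, sd_names) {
  is.call(e) && is.name(e[[1L]]) &&
    as.character(e[[1L]]) %in% c("sum", "mean", "min", "max") &&
    length(e) == 2L && is.name(e[[2L]]) &&
    as.character(e[[2L]]) %in% sd_names
}

# For arithmetic calls, recurse: every operand must itself be acceptable.
expr_ok = function(e, sd_names) {
  if (is.call(e) && is.name(e[[1L]]) &&
      as.character(e[[1L]]) %in% c("+", "-", "*", "/")) {
    all(vapply(as.list(e)[-1L], expr_ok, logical(1L), sd_names = sd_names))
  } else {
    agg_ok(e, sd_names)
  }
}

# DT has columns a and b; with by=b, only a is part of .SD
sd_names = setdiff(c("a", "b"), "b")
expr_ok(quote(sum(a) - mean(a)), sd_names)  # TRUE: both arguments are .SD columns
expr_ok(quote(sum(b) - mean(a)), sd_names)  # FALSE: b is the grouping column
```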

R/data.table.R (outdated)

```r
}
}
# Call extracted GForce optimization function
if ( getOption("datatable.optimize")>=1L && (is.call(jsub) || (is.name(jsub) && jsub %chin% c(".SD", ".N"))) ) {
```
@MichaelChirico (Member) commented Jan 7, 2026

Let's move the latter test into `.attempt_optimize`? Just a quick escape at the top. I think it will look cleaner, and the checks for `.SD`/`.N` feel like they belong there anyway.

Edit: I see that it's feeding into the verbose message below... I think that can also be moved into the helpers.
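The suggestion amounts to a guard clause at the top of the helper. A hypothetical sketch of that shape follows; `attempt_optimize_sketch()` is an illustrative name, not the merged code, and base `%in%` on `as.character()` stands in for data.table's `%chin%` so the snippet runs without the package. The verbose messaging is left out.

```r
attempt_optimize_sketch = function(jsub) {
  level = getOption("datatable.optimize", 0L)
  # only calls, or the bare symbols .SD / .N, are candidates for optimization
  candidate = is.call(jsub) ||
    (is.name(jsub) && as.character(jsub) %in% c(".SD", ".N"))
  # quick escape at the top: hand j back untouched
  if (level < 1L || !candidate) return(list(jsub = jsub, tried = FALSE))
  # ... lapply, GForce and mean optimizations would be attempted here ...
  list(jsub = jsub, tried = TRUE)
}

op = options(datatable.optimize = 2L)
attempt_optimize_sketch(quote(x))$tried       # FALSE: a plain column name
attempt_optimize_sketch(quote(.N))$tried      # TRUE
attempt_optimize_sketch(quote(sum(x)))$tried  # TRUE
options(op)
```

The caller then needs no condition of its own, which is what makes the call site read more cleanly.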

@ben-schwen (Member, Author)

good catch! done.

```r
}

# attempts to optimize j expressions using lapply, GForce, and mean optimizations
.attempt_optimize = function(jsub, jvnames, sdvars, SDenv, verbose, i, byjoin, f__, ansvars, use.I, lhs, names_x, envir) {
```
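The comment describes `.attempt_optimize` as trying several optimizations in turn. One way to structure such a helper is an ordered pipeline where each step returns `jsub` either unchanged or rewritten, and later steps see earlier rewrites. The hypothetical sketch below shows such a pipeline with a single lapply-expansion step (`lapply(.SD, f)` becoming `list(f(col1), f(col2), ...)`); `expand_lapply()` and `run_pipeline()` are illustrative names, not data.table functions.

```r
# Rewrite lapply(.SD, f) into list(f(col1), f(col2), ...); otherwise no-op.
expand_lapply = function(jsub, sdvars) {
  if (is.call(jsub) && length(jsub) == 3L &&
      identical(jsub[[1L]], as.name("lapply")) &&
      identical(jsub[[2L]], as.name(".SD")) && is.name(jsub[[3L]])) {
    fun = as.character(jsub[[3L]])
    cols = lapply(sdvars, function(v) call(fun, as.name(v)))
    return(as.call(c(list(as.name("list")), cols)))
  }
  jsub
}

# Apply each step in order; each one sees the output of the previous step.
run_pipeline = function(jsub, steps) {
  for (step in steps) jsub = step(jsub)
  jsub
}

steps = list(
  function(j) expand_lapply(j, sdvars = c("a", "b")),
  function(j) j  # placeholder for a GForce step, e.g. sum -> gsum
)
run_pipeline(quote(lapply(.SD, sum)), steps)  # list(sum(a), sum(b))
```

Ordering matters: expanding `lapply(.SD, sum)` first exposes the individual `sum()` calls that a GForce-style step can then recognise.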
Member

really like how clean this is 👍

@MichaelChirico (Member) left a review comment

About halfway done reading the implementation now. Thanks for your patience with the review! I'm really excited for this to get finished :)


```r
# Optimize expressions using GForce (C-level optimizations)
# This function replaces functions like mean() with gmean() for fast C implementations
.optimize_gforce = function(jsub, SDenv, verbose, i, byjoin, f__, ansvars, use.I, lhs, names_x, envir) {
```
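With arithmetic allowed in `j`, the rewrite from `mean()` to `gmean()` can no longer look only at the top-level call; it has to descend into operator calls such as `-` or `/`. The hypothetical recursive sketch below shows that traversal. The mapping is an illustrative subset of data.table's internal G-function names, which are only built as symbols here and never called, and the sketch deliberately skips the eligibility check that `.gforce_ok()` performs.

```r
# Illustrative subset: aggregate name -> GForce counterpart
gfun_map = c(sum = "gsum", mean = "gmean", min = "gmin", max = "gmax")

to_gforce = function(e) {
  if (!is.call(e)) return(e)              # symbols and constants stay as-is
  head = e[[1L]]
  if (is.name(head) && as.character(head) %in% names(gfun_map)) {
    e[[1L]] = as.name(gfun_map[[as.character(head)]])  # mean(x) -> gmean(x)
    return(e)
  }
  # anything else (e.g. `-`, `/`, list): recurse into the arguments
  for (k in seq_along(e)[-1L]) e[[k]] = to_gforce(e[[k]])
  e
}

to_gforce(quote(list(max(x) - min(x), mean(y))))  # list(gmax(x) - gmin(x), gmean(y))
```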
Member

One thing that comes to mind on seeing such a long signature: using a "struct" instead of passing individual arguments, e.g.

https://stackoverflow.com/questions/31864162/what-are-the-pros-and-cons-of-using-a-struct-argument-v-s-multiple-parameters

Grouping or combining some of the arguments might make the code easier to understand.

Not a requirement but something to ponder.
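In R the natural "struct" is a named list. The hypothetical sketch below bundles some of the inputs from the signature above into one context object. `make_opt_context()`, the `opt_context` class, and `columns_used()` are illustrative names; only the field names are borrowed from the existing parameters.

```r
# Hypothetical context "struct": one named list bundling inputs that are
# currently threaded through every helper as separate arguments.
make_opt_context = function(SDenv, verbose = FALSE, i = NULL, byjoin = FALSE,
                            ansvars = character(), names_x = character(),
                            envir = parent.frame()) {
  structure(
    list(SDenv = SDenv, verbose = verbose, i = i, byjoin = byjoin,
         ansvars = ansvars, names_x = names_x, envir = envir),
    class = "opt_context"
  )
}

# A helper then takes jsub plus a single context argument
columns_used = function(jsub, ctx) {
  stopifnot(inherits(ctx, "opt_context"))
  # fields are read from ctx instead of from individual parameters
  intersect(all.vars(jsub), ctx$names_x)
}

ctx = make_opt_context(SDenv = new.env(), names_x = c("a", "b"))
columns_used(quote(sum(a) - mean(z)), ctx)  # "a"
```

The trade-off from the linked question applies: shorter signatures and a single place to add new fields, at the cost of making each helper's actual dependencies less visible at the call site.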


Labels: none yet

Projects: none yet

4 participants